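Vertical distributed learning exploits the local features collected by multiple learning workers to form a better global model. However, exchanging data between the workers and the model aggregator for parameter training incurs a heavy communication burden, especially when the learning system is built on capacity-constrained wireless networks. In this paper, we propose a novel hierarchical distributed learning framework in which each worker separately learns a low-dimensional embedding of its locally observed data. The workers then perform a communication-efficient distributed max operation to transmit the synthesized input to the aggregator efficiently. For data exchange over a shared wireless channel, we propose an opportunistic carrier sensing-based protocol that implements the max function over the output data of all learning workers. Our simulation experiments show that the proposed learning framework achieves almost the same model accuracy as a learning model that uses the concatenation of all raw outputs from the learning workers, while requiring a communication load that is independent of the number of workers.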
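The max operation over worker outputs can be illustrated with a minimal sketch. It assumes embedding values normalized to [0, 1] and a hypothetical timer rule (wait time decreasing linearly in the value); neither detail is given in the abstract, and ties and collisions are ignored. The function names are illustrative only.

```python
import numpy as np

def max_pool_embeddings(embeddings):
    """Element-wise max over workers; embeddings has shape (num_workers, dim)."""
    return np.max(embeddings, axis=0)

def opportunistic_max(values, t_max=1.0):
    """Pick the largest value via backoff timers (values assumed in [0, 1]).

    Hypothetical rule: each worker waits t_max * (1 - value). The worker
    whose timer expires first transmits; the others sense the busy channel
    and stay silent. Ties and collisions are ignored in this sketch.
    """
    values = np.asarray(values, dtype=float)
    timers = t_max * (1.0 - values)
    winner = int(np.argmin(timers))
    return values[winner], winner

def pool_over_channel(embeddings):
    """Run one contention round per embedding dimension."""
    embeddings = np.asarray(embeddings, dtype=float)
    pooled = np.empty(embeddings.shape[1])
    transmissions = 0
    for j in range(embeddings.shape[1]):
        pooled[j], _ = opportunistic_max(embeddings[:, j])
        transmissions += 1  # exactly one worker transmits per dimension
    return pooled, transmissions
```

In this sketch the number of transmissions equals the embedding dimension, whatever the number of workers, which mirrors the claim that the communication load does not grow with the worker count.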
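In multi-task learning (MTL), a joint model is trained to make predictions for several tasks simultaneously. Joint training reduces computation costs and improves data efficiency; however, because the gradients of these different tasks may conflict, training a joint model for MTL often yields lower performance than the corresponding single-task counterparts. A common way to alleviate this issue is to combine the per-task gradients into a joint update direction using a particular heuristic. In this paper, we propose viewing the gradient combination step as a bargaining game in which the tasks negotiate to reach an agreement on a joint direction of parameter update. Under certain assumptions, the bargaining problem has a unique solution, known as the Nash bargaining solution, which we propose to use as a principled approach to multi-task learning. We describe a new MTL optimization procedure, Nash-MTL, and derive theoretical guarantees for its convergence. Empirically, we show that Nash-MTL achieves state-of-the-art results on multiple MTL benchmarks in various domains.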
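As an illustration of a bargaining-style gradient combination, here is a minimal sketch. It assumes the joint direction d maximizes the sum of log per-task improvements, sum_i log(g_i . d), over a norm ball; this formulation is an assumption and is not stated in the abstract. Under it, d is proportional to sum_i alpha_i g_i with weights satisfying K alpha = 1 / alpha, where K is the Gram matrix of the task gradients. The solver below (damped Newton) is an illustrative choice, not the procedure from the paper, and it assumes nonzero, linearly independent gradients.

```python
import numpy as np

def nash_weights(grads, iters=100, tol=1e-10):
    """Solve K @ alpha = 1 / alpha for alpha > 0, with K = grads @ grads.T.

    grads has shape (num_tasks, num_params). The system is the stationarity
    condition of phi(a) = 0.5 * a @ K @ a - sum(log(a)), which is strictly
    convex on the positive orthant when the gradients are independent.
    """
    grads = np.asarray(grads, dtype=float)
    K = grads @ grads.T

    def phi(a):
        return 0.5 * a @ K @ a - np.sum(np.log(a))

    alpha = 1.0 / np.sqrt(np.diag(K))  # exact when gradients are orthogonal
    for _ in range(iters):
        residual = K @ alpha - 1.0 / alpha
        if np.linalg.norm(residual) < tol:
            break
        hessian = K + np.diag(1.0 / alpha**2)
        step = np.linalg.solve(hessian, residual)
        # Backtrack so alpha stays positive and phi does not increase.
        t = 1.0
        new = alpha - t * step
        while np.any(new <= 0) or phi(new) > phi(alpha):
            t *= 0.5
            new = alpha - t * step
        alpha = new
    return alpha

def joint_direction(grads):
    """Weighted combination of the task gradients (unnormalized)."""
    grads = np.asarray(grads, dtype=float)
    return nash_weights(grads) @ grads
```

At a solution, each gradient has a positive inner product with the joint direction (g_i . d = 1 / alpha_i), so under these assumptions a small step along d does not work against any task to first order.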
Classical methods for acoustic scene mapping require the estimation of time difference of arrival (TDOA) between microphones. Unfortunately, TDOA estimation is very sensitive to reverberation and additive noise. We introduce an unsupervised data-driven approach that exploits the natural structure of the data. Our method builds upon local conformal autoencoders (LOCA) - an offline deep learning scheme for learning standardized data coordinates from measurements. Our experimental setup includes a microphone array that measures the transmitted sound source at multiple locations across the acoustic enclosure. We demonstrate that LOCA learns a representation that is isometric to the spatial locations of the microphones. The performance of our method is evaluated using a series of realistic simulations and compared with other dimensionality-reduction schemes. We further assess the influence of reverberation on the results of LOCA and show that it demonstrates considerable robustness.
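For context on the classical step mentioned at the start of this abstract, the following is a minimal sketch of TDOA estimation by plain cross-correlation between two microphone signals. It is a generic textbook estimator and not part of LOCA; the function name and the sampling rate used in any call are illustrative assumptions.

```python
import numpy as np

def estimate_tdoa(x, y, fs):
    """Estimate the delay of y relative to x in seconds.

    The result is positive when y lags x. The lag is taken at the peak of
    the full cross-correlation, so the resolution is one sample (1 / fs).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    corr = np.correlate(y, x, mode="full")
    # Index 0 of the "full" output corresponds to lag -(len(x) - 1).
    lag = int(np.argmax(corr)) - (len(x) - 1)
    return lag / fs
```

With reverberation or additive noise, the correlation peak can shift or split between reflections, which is the sensitivity the abstract refers to.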
Recent work attributes progress in NLP to large language models (LMs) with increased model size and large quantities of pretraining data. Despite this, current state-of-the-art LMs for Hebrew are both under-parameterized and under-trained compared to LMs in other languages. Additionally, previous work on pretrained Hebrew LMs focused on encoder-only models. While the encoder-only architecture is beneficial for classification tasks, it does not cater well to sub-word prediction tasks, such as Named Entity Recognition, when considering the morphologically rich nature of Hebrew. In this paper we argue that sequence-to-sequence generative architectures are more suitable for LMs in the case of morphologically rich languages (MRLs) such as Hebrew. We demonstrate that by casting tasks in the Hebrew NLP pipeline as text-to-text tasks, we can leverage powerful multilingual, pretrained sequence-to-sequence models such as mT5, eliminating the need for a specialized, morpheme-based, separately fine-tuned decoder. Using this approach, our experiments show substantial improvements over previously published results on existing Hebrew NLP benchmarks. These results suggest that multilingual sequence-to-sequence models present a promising building block for NLP for MRLs.
In this short paper, we present our ongoing work on the veriFIRE project -- a collaboration between industry and academia, aimed at using verification to increase the reliability of a real-world, safety-critical system. The system we target is an airborne platform for wildfire detection, which incorporates two deep neural networks. We describe the system and its properties of interest, and discuss our attempts to verify the system's consistency, i.e., its ability to continue to correctly classify a given input, even if the wildfire it describes increases in intensity. We regard this work as a step towards the incorporation of academic-oriented verification tools into real-world systems of interest.
The post-training quantization (PTQ) challenge of bringing quantized neural network accuracy close to that of the original network has drawn much attention, driven by industry demand. Many methods emphasize the optimization of a specific degree of freedom (DoF), such as the quantization step size, preconditioning factors, or bias fixing, often chained to others in multi-step solutions. Here we rethink quantized network parameterization in a hardware-aware fashion, towards a unified analysis of all quantization DoF, permitting for the first time their joint end-to-end finetuning. Our simple, extendable, single-step method, dubbed quantization-aware finetuning (QFT), achieves 4-bit weight quantization results on par with the state of the art (SoTA) within PTQ constraints of speed and resource.
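To make the "quantization step size" degree of freedom concrete, here is a minimal sketch of symmetric uniform 4-bit weight quantization. It is a generic baseline, not the QFT method; choosing the step size from the largest absolute weight is an assumption made for illustration, and the function names are hypothetical.

```python
import numpy as np

def quantize_weights(w, num_bits=4):
    """Map float weights to signed integers on a uniform grid.

    Returns the integer codes and the step size (scale). The scale is set
    from the largest absolute weight, which is one simple choice among many.
    """
    w = np.asarray(w, dtype=float)
    qmax = 2 ** (num_bits - 1) - 1   # 7 for 4 bits
    qmin = -(2 ** (num_bits - 1))    # -8 for 4 bits
    scale = max(float(np.max(np.abs(w))) / qmax, 1e-12)  # avoid division by zero
    q = np.clip(np.round(w / scale), qmin, qmax).astype(np.int8)
    return q, scale

def dequantize_weights(q, scale):
    """Recover approximate float weights from integer codes."""
    return q.astype(float) * scale
```

With this choice of scale no weight is clipped, so the reconstruction error per weight is at most half a step. Methods that tune the step size typically trade some clipping of outliers for a finer grid.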
Training a generative model on a single image has drawn significant attention in recent years. Single image generative methods are designed to learn the internal patch distribution of a single natural image at multiple scales. These models can be used for drawing diverse samples that semantically resemble the training image, as well as for solving many image editing and restoration tasks that involve that particular image. Here, we introduce an extended framework, which allows the internal distributions of several images to be learned simultaneously by using a single model with spatially varying image-identity conditioning. Our BlendGAN opens the door to applications that are not supported by single-image models, including morphing, melding, and structure-texture fusion between two or more arbitrary images.
Question answering models commonly have access to two sources of "knowledge" at inference time: (1) parametric knowledge - the factual knowledge encoded in the model weights, and (2) contextual knowledge - external knowledge (e.g., a Wikipedia passage) given to the model to generate a grounded answer. Having these two sources of knowledge entangled is a core issue for generative QA models, as it is unclear whether the answer stems from the given non-parametric knowledge or not. This lack of clarity has implications for trust, interpretability, and factuality. In this work, we propose a new paradigm in which QA models are trained to disentangle the two sources of knowledge. Using counterfactual data augmentation, we introduce a model that predicts two answers for a given question: one based on the given contextual knowledge and one based on parametric knowledge. Our experiments on the Natural Questions dataset show that this approach improves the performance of QA models by making them more robust to knowledge conflicts between the two knowledge sources, while generating useful disentangled answers.
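Visual question answering (VQA) has been studied primarily through the lens of the English language. However, tackling VQA in other languages in the same manner would require a considerable amount of resources. In this paper, we propose scalable solutions to multilingual visual question answering (MVQA) on both the data and modeling fronts. We first propose a translation-based framework for MVQA data generation that requires much less human annotation effort than the conventional approach of directly collecting questions and answers. We then apply our framework to the multilingual captions in the CrossModal-3600 dataset and develop an efficient annotation protocol to create Maverics-XM3600 (MAXM), a test-only VQA benchmark in 7 diverse languages. Finally, we propose an approach to unified, extensible, open-ended, and end-to-end MVQA modeling and demonstrate strong performance in 13 languages.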
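The ultimate goal of any sparse coding method is to accurately recover an unknown sparse vector from a few noisy linear measurements. Unfortunately, this estimation problem is NP-hard in general, and it is therefore always approached with an approximation method, such as Lasso or orthogonal matching pursuit, thus trading accuracy for lower computational complexity. In this paper, we develop a quantum-inspired algorithm for sparse coding, on the premise that the emergence of quantum computers and Ising machines can potentially lead to more accurate estimates than classical approximation methods. To this end, we formulate the most general sparse coding problem as a quadratic unconstrained binary optimization (QUBO) task, which can be minimized efficiently using quantum technology. To derive a QUBO model that is also efficient in terms of the number of spins (space complexity), we separate our analysis into three different scenarios. These are defined by the number of bits required to express the underlying sparse vector: binary, 2-bit, and a general fixed-point representation. We conduct numerical experiments with simulated data on the Lightsolver quantum-inspired digital platform to verify the correctness of our QUBO formulation and to demonstrate its advantage over baseline methods.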
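The binary scenario can be illustrated with a minimal sketch. It assumes the sparse coding objective is the l0-penalized least-squares cost ||y - A x||^2 + lam * ||x||_0 with x in {0, 1}^n; this specific objective is an assumption, since the abstract does not spell it out. Because x_i^2 = x_i for binary variables, the cost equals x^T Q x plus the constant y^T y, with Q = A^T A + diag(lam - 2 A^T y). The brute-force minimizer stands in for a quantum or Ising solver and is only practical for very small n.

```python
import itertools
import numpy as np

def sparse_coding_qubo(A, y, lam):
    """Build Q so that x @ Q @ x + y @ y equals the penalized cost for binary x."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    gram = A.T @ A
    # Linear terms go on the diagonal, since x_i**2 == x_i for binary x_i.
    linear = lam - 2.0 * (A.T @ y)
    return gram + np.diag(linear)

def brute_force_minimize(Q):
    """Enumerate all binary vectors and return the one with the lowest energy."""
    n = Q.shape[0]
    best_x, best_energy = None, np.inf
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits, dtype=float)
        energy = x @ Q @ x
        if energy < best_energy:
            best_x, best_energy = x, energy
    return best_x, best_energy
```

The 2-bit and fixed-point scenarios would need several binary variables per entry of x, which is where the number of spins becomes a design concern.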